# generate object x (no output):
x <- 5
# display log(x)
log(x)
[1] 1.609438
Session 2
In R, everything is an object.
Objects get a name with the assignment operator <- (recommended) or =.
Names have to start with a letter and may contain only letters, numbers, and the characters “.” and “_”.
R is case sensitive: \(\text{Name} \neq \text{name}\)!
Objects can store vectors, matrices, lists, data frames, functions…
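To see different kinds of objects side by side, here is a minimal sketch; the object names (num_vec, chr_vec, mat, lst, dat, sq, Name, name) are illustrative choices, not part of the course material. class() reports what each object stores.

```r
# Illustrative objects of different types (names are hypothetical)
num_vec <- c(1.5, 2, 3)                           # numeric vector
chr_vec <- c("a", "b")                            # character vector
mat     <- matrix(1:6, nrow = 2)                  # 2 x 3 matrix
lst     <- list(a = 1, b = "text")                # list with named components
dat     <- data.frame(id = 1:2, y = c(0.1, 0.2))  # data frame
sq      <- function(z) z^2                        # functions are objects, too

class(num_vec)  # "numeric"
class(chr_vec)  # "character"
class(mat)      # "matrix" "array" (since R 4.0.0)
class(lst)      # "list"
class(dat)      # "data.frame"
class(sq)       # "function"
sq(3)           # 9

# R is case sensitive: these are two different objects
Name <- 1
name <- 2
Name == name    # FALSE
```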
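Vectors are created with vec <- c(value1, value2, value3).
Repetitions can be generated with rep().
Exercise:
1. Draw … random numbers from a normal distribution with mean 0 and variance 1. Store your results in the object norm.vec.
2. Use rep() to repeat each element of norm.vec 3 times. Store the result in the object norm.vec.rep.
3. Is mean(norm.vec.rep^2) equal to mean(norm.vec.rep)^2?
Logical values are either TRUE or FALSE.
Conditional statements take the form if(condition is TRUE){do this}else{do that}.
We can check whether two objects are equal by ==, different by != or compare them with < and >.
Conditions can be combined with “AND” & and “OR” |.
For vectors, comparisons as well as & and | are then applied element-wise.
Characters (strings) are defined with "" or ''.
Exercise:
1. Compare the character "1" to the numeric 1.
2. Add the characters "1" and "2".
3. Add as.numeric("1") and as.numeric("2"). What happened?
4. Create a vector containing the numeric 1 and the character "2". Of which type are the elements of the vector?
Vector elements can be named directly or with the names() command.
avg_temp <- c(Maastricht = 14.2, Amsterdam = 13.4, Rotterdam = 13.7)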
print(avg_temp) # names appear on top of elements
Maastricht Amsterdam Rotterdam
14.2 13.4 13.7
cities <- names(avg_temp) # extract the names
cities
[1] "Maastricht" "Amsterdam" "Rotterdam"
# Alternatively, we can define data and names separately
temp <- c(14.2, 13.4, 13.7)
names(temp) <- cities # recall that we have defined "cities" earlier!
print(temp)
Maastricht Amsterdam Rotterdam
14.2 13.4 13.7
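The vector operations described earlier in this section (rep(), comparisons, logical operators, if-else, and type conversion) can be tried out with the small sketch below; the vector v and the other object names are illustrative.

```r
v <- c(2, 5, 8)  # illustrative vector

rep(v, times = 2)   # 2 5 8 2 5 8: repeat the whole vector
rep(v, each = 3)    # 2 2 2 5 5 5 8 8 8: repeat each element

# comparisons and logical operators work element-wise
v == 5              # FALSE  TRUE FALSE
v != 5              #  TRUE FALSE  TRUE
v > 3 & v < 8       # FALSE  TRUE FALSE ("AND")
v < 3 | v > 7       #  TRUE FALSE  TRUE ("OR")

# if-else evaluates a single TRUE/FALSE condition
x <- 4
if (x > 3) {
  msg <- "x is larger than 3"
} else {
  msg <- "x is at most 3"
}
msg                 # "x is larger than 3"

# characters must be converted to numbers before doing arithmetic
as.numeric("1") + as.numeric("2")   # 3

# mixing a number and a character coerces all elements to character
mixed <- c(1, "2")
class(mixed)        # "character"
```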
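With [-k], we can get the vector except for the \(k\)-th element.
NA (“not available”) indicates missing values.
Arithmetic involving NA yields NA.
NaN (“not a number”) indicates the result of a mathematically undefined operation.
A matrix with m rows can be created directly using matrix(vector,nrow=m).
Alternatively, we can combine vectors by row with rbind(v1,v2,...) or by column with cbind(v1,v2,...).
Rows and columns can be named with rownames() and colnames().
We access an element by [rownumber,colnumber], the k-th row by [k,] and the k-th column by [,k].
Exercise:
- Create three vectors and use rbind() or cbind() to combine them into the identity matrix.
- Check your result by typing diag(3) in your console.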
3. Get the data for April and May by
   - including only the first and second row
   - excluding the third row
   - using the names
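A short sketch of negative indexing, missing values, and the different ways of building and indexing a matrix; all object names (v, w, M1, M2, M3) are illustrative.

```r
v <- c(10, 20, 30, 40)
v[-2]                      # 10 30 40: all elements except the 2nd

# NA marks missing values and propagates through calculations
w <- c(1, NA, 3)
sum(w)                     # NA
sum(w, na.rm = TRUE)       # 4: ignore the missing value
is.na(w)                   # FALSE  TRUE FALSE

# NaN is the result of mathematically undefined operations
0 / 0                      # NaN
Inf - Inf                  # NaN

# three ways to build the same 2 x 3 matrix
M1 <- matrix(1:6, nrow = 2)              # filled column by column
M2 <- rbind(c(1, 3, 5), c(2, 4, 6))      # vectors stacked as rows
M3 <- cbind(c(1, 2), c(3, 4), c(5, 6))   # vectors placed side by side as columns
all(M1 == M2) && all(M2 == M3)           # TRUE

# naming rows/columns and indexing
rownames(M1) <- c("r1", "r2")
colnames(M1) <- c("a", "b", "c")
M1[2, 3]                   # 6: row 2, column 3
M1[1, ]                    # first row
M1[, "b"]                  # column "b", selected by name
```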
R can do “regular” matrix algebra, and even lets you do operations that are not well-defined mathematically.
t(A) is the transpose of the matrix A.
# define matrix containing normal data
data.vec <- rnorm(9, mean = 0, sd = 1)
A <- matrix(data.vec, nrow = 3)
A # return A
[,1] [,2] [,3]
[1,] 0.2716964 0.2826514 0.7622134
[2,] -1.5595010 -1.1265315 -0.2985537
[3,] -0.3099461 -1.3427173 -1.3471083
t(A) # transpose of A
[,1] [,2] [,3]
[1,] 0.2716964 -1.5595010 -0.3099461
[2,] 0.2826514 -1.1265315 -1.3427173
[3,] 0.7622134 -0.2985537 -1.3471083
solve(A) returns the inverse of an invertible matrix.
The operator * does element-wise multiplication, while %*% does matrix multiplication.
solve(A) # inverse of A
[,1] [,2] [,3]
[1,] 0.2847033 -0.170459 0.5537921
[2,] 2.9389180 0.137170 0.3102885
[3,] -0.5074700 -0.349271 -0.1702998
A %*% solve(A) # identity matrix; entries such as -5.551115e-17 are floating-point rounding errors
[,1] [,2] [,3]
[1,] 1.000000e+00 -5.551115e-17 2.775558e-17
[2,] 0.000000e+00 1.000000e+00 -3.469447e-17
[3,] -4.440892e-16 -5.551115e-17 1.000000e+00
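Because A above is drawn at random, its printed values change on each run. With a small fixed matrix B (a hypothetical example), the difference between * and %*% and the role of solve() can be checked by hand: det(B) = 2*3 - 1*1 = 5, so B is invertible.

```r
B <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column-wise: rows (2, 1) and (1, 3)

B * B                # element-wise: rows (4, 1) and (1, 9)
B %*% B              # matrix product: rows (5, 5) and (5, 10)

B.inv <- solve(B)    # inverse: rows (0.6, -0.2) and (-0.2, 0.4)
B %*% B.inv          # identity matrix (up to rounding)
all.equal(B %*% B.inv, diag(2))   # TRUE

# solve(a, b) solves the linear system B z = b directly
solve(B, c(1, 2))    # 0.2 0.6
```

Solving the system with solve(B, b) avoids forming the inverse explicitly, which is usually the numerically preferable route when only z is needed.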
A list is a generic collection of objects.
Lists are created by mylist <- list(name1=component1, name2=component2,...).
The names of the components are returned by names(mylist).
Components can be accessed by name with the $ (dollar sign) operator, e.g., mylist$name1, or by position with [[]].
Data frames are simply data sets in R terminology.
Data files can contain multiple data sets.
We can create a data frame with data.frame() or transform a matrix mat into a data frame by as.data.frame(mat).
Many functions (e.g., lm() for regressions) need a data frame as input (see later sessions).
# generate a data frame
ID <- 1:4
hourly_wage <- rnorm(n = 4, mean = 20, sd = 1) # create 4 draws from N(20,1)
city <- c("Maastricht", "Eindhoven", "Amsterdam", NA)
dats <- data.frame(ID, hourly_wage, city) # combine the vectors into a data frame
dats
ID hourly_wage city
1 1 20.18216 Maastricht
2 2 20.37298 Eindhoven
3 3 20.68820 Amsterdam
4 4 20.96485 <NA>
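A minimal sketch of working with a list and a small data frame; mylist follows the template above, while its components and the data frame people (with variables wage and high_wage) are hypothetical examples chosen for illustration.

```r
# a list can hold components of different types and lengths
mylist <- list(name = "example", values = c(1.5, 2.5), n = 2L)
names(mylist)        # "name"   "values" "n"
mylist$values        # by name with $: 1.5 2.5
mylist[["name"]]     # by name with [[ ]]: "example"
mylist[[3]]          # by position with [[ ]]: 2

# a small data frame with hypothetical wage data
people <- data.frame(ID = 1:4, wage = c(18.5, 21.0, 19.2, 23.4))
people$wage                              # extract a variable
people$high_wage <- people$wage > 20     # add a new (logical) variable
subset(people, wage > 20)                # keep rows where the condition is TRUE
```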
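We can access a variable of a data frame with the $ operator, e.g., dats$city.
We can also add new variables with the $ operator.
View() opens a data viewer. Very useful (but difficult to demonstrate on these slides).
With subset(data_frame,condition), we can easily get a subset of the original data frame where condition is TRUE.
Exercise:
1. Create a variable ID that contains the sequence 1,2,…,100.
2. Create a variable income that contains 100 random draws from N(10,1).
3. Create a variable female that is 1 for ID=1,...,50 and 0 otherwise. (hint: you can achieve this by using rep() twice and combining two vectors with c())
4. Combine the variables into a data frame called my_df.
5. Inspect the data with View(my_df).
6. Create a data frame sub_my_df that contains only individuals with income larger than 10.
n <- 100 # set the sample size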
X <- rnorm(n, mean = 1, sd = 2) # define the observed covariate X
epsilon <- rnorm(n, mean = 0, sd = 1) # define the model error
beta0 <- 1 # define true intercept
beta1 <- 2 # define true slope
Y <- beta0 + beta1 * X + epsilon # generate Y according to a linear model
# recall the formula in a bivariate model
beta1.hat <- cov(X,Y) / var(X)
beta0.hat <- mean(Y) - beta1.hat * mean(X)
# print estimators
beta0.hat
[1] 0.9852093
beta1.hat
[1] 1.974165
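For reference, the bivariate formulas used for beta1.hat and beta0.hat, written with sample moments (the \(1/(n-1)\) factors inside cov() and var() cancel in the ratio):

```latex
\hat\beta_1 = \frac{\widehat{\operatorname{Cov}}(X,Y)}{\widehat{\operatorname{Var}}(X)}
            = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2},
\qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X
```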
Exercise:
1. … X. What is the effect on beta1.hat?
2. … epsilon. What is the effect on beta0.hat?
3. … X and epsilon. What is the effect?
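To check the covariance formula, here is a sketch that fixes the random seed (123 is an arbitrary choice, made only so that results are reproducible) and compares beta0.hat and beta1.hat with the coefficients from lm() and from the matrix formula \((X'X)^{-1}X'Y\), computed with solve(). The names fit, Xmat, and beta.hat are illustrative.

```r
set.seed(123)                           # arbitrary seed for reproducibility
n <- 100
X <- rnorm(n, mean = 1, sd = 2)
epsilon <- rnorm(n, mean = 0, sd = 1)
beta0 <- 1
beta1 <- 2
Y <- beta0 + beta1 * X + epsilon

# covariance formula, as on the slide
beta1.hat <- cov(X, Y) / var(X)
beta0.hat <- mean(Y) - beta1.hat * mean(X)

# the same estimates from lm(), which takes the data as a data frame
fit <- lm(Y ~ X, data = data.frame(X, Y))
coef(fit)                               # equal to beta0.hat and beta1.hat

# matrix formula (X'X)^{-1} X'Y, with a column of ones for the intercept
Xmat <- cbind(1, X)
beta.hat <- solve(t(Xmat) %*% Xmat, t(Xmat) %*% Y)
beta.hat                                # again beta0.hat and beta1.hat
```

All three routes compute the same least-squares estimates; they differ only in how much of the work R does for you.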